Clustering performance comparison using K-means and expectation maximization algorithms

نویسندگان

  • Yong Gyu Jung
  • Min Soo Kang
  • Jun Heo
چکیده

Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparison of Clustering Approaches for Gene Expression Data

Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and...

متن کامل

Increasing the Accuracy of Recommender Systems Using the Combination of K-Means and Differential Evolution Algorithms

Recommender systems are the systems that try to make recommendations to each user based on performance, personal tastes, user behaviors, and the context that match their personal preferences and help them in the decision-making process. One of the most important subjects regarding these systems is to increase the system accuracy which means how much the recommendations are close to the user int...

متن کامل

Sample-weighted clustering methods

Keywords: Cluster analysis Maximum entropy principle k-means Fuzzy c-means Sample weights Robustness a b s t r a c t Although there have been many researches on cluster analysis considering feature (or variable) weights, little effort has been made regarding sample weights in clustering. In practice, not every sample in a data set has the same importance in cluster analysis. Therefore, it is in...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

Bayesian K-Means as a “Maximization-Expectation” Algorithm

We introduce a new class of “maximization expectation” (ME) algorithms where we maximize over hidden variables but marginalize over random parameters. This reverses the roles of expectation and maximization in the classical EM algorithm. In the context of clustering, we argue that these hard assignments open the door to very fast implementations based on data-structures such as kdtrees and cong...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 28  شماره 

صفحات  -

تاریخ انتشار 2014